Relative Entropy Policy Search
Authors
Abstract
This technical report describes a cute idea of how to create new policy search approaches. It directly relates to the Natural Actor-Critic methods but allows the derivation of one-shot solutions. Future work may include the application to interesting problems.

1 Problem Statement

In reinforcement learning, we have an agent which is in a state $s$ and draws actions $a$ from a policy $\pi$. Upon an action, it receives a reward $r(s, a) = R_{sa}$ and transfers to a next state $s'$ where it takes a next action $a'$. In most cases, we have Markovian environments and policies, where $s' \sim p(s'|s, a) = P_{sa}^{s'}$ and $a \sim \pi(a|s)$. The goal of all reinforcement learning methods is the maximization of the expected return

\bar{J}(\pi) = E\left\{ \sum_{t=0}^{T} r(s_t, a_t) \right\}.   (1)

We are generally interested in two cases, i.e., (i) the episodic open-loop case, where the system is always restarted from the initial state distribution $p(s_0)$, and (ii) the stationary infinite-horizon case, where $T \to \infty$. Both have substantial differences in their mathematical treatment as well as in their optimal solutions.

1.1 Episodic Open-Loop Case

In the episodic open-loop case, a distribution $p(\tau)$ over trajectories $\tau$ and a return $R(\tau)$ of a trajectory $\tau$ are assumed; both are given by

p(\tau) = p(s_0) \prod_{t=0}^{T} \pi(a_t|s_t)\, p(s_{t+1}|s_t, a_t),   (2)

R(\tau) = \sum_{t=0}^{T} r(s_t, a_t).   (3)

The expected return can now be given as $\bar{J}(\pi) = \sum_\tau p(\tau) R(\tau)$. Note that all approximations to the optimal policy depend on the initial state distribution $p(s_0)$. This case has been predominant in our previous work.
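In the episodic case, the expected return $\bar{J}(\pi) = \sum_\tau p(\tau) R(\tau)$ can be estimated by sampling trajectories from $p(\tau)$ and averaging the returns $R(\tau)$. A minimal Monte Carlo sketch, assuming a hypothetical two-state, two-action toy MDP (the transition, reward, and policy tables below are illustrative and not from the report):

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions, horizon T = 10.
n_states, n_actions, T = 2, 2, 10
rng = np.random.default_rng(0)

# P[s, a] is the next-state distribution p(s'|s, a); R[s, a] is the reward r(s, a).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
p_s0 = np.array([0.5, 0.5])          # initial state distribution p(s0)
pi = np.array([[0.7, 0.3],           # stochastic policy pi(a|s)
               [0.4, 0.6]])

def sample_return():
    """Sample one trajectory tau ~ p(tau), return R(tau) = sum_{t=0}^{T} r(s_t, a_t)."""
    s = rng.choice(n_states, p=p_s0)
    total = 0.0
    for _ in range(T + 1):           # t = 0, ..., T
        a = rng.choice(n_actions, p=pi[s])
        total += R[s, a]
        s = rng.choice(n_states, p=P[s, a])
    return total

# Monte Carlo estimate of the expected return J(pi) = E{R(tau)}
J_hat = np.mean([sample_return() for _ in range(5000)])
print(f"Estimated expected return: {J_hat:.2f}")
```

Because every trajectory is restarted from $p(s_0)$, this estimator (and any policy improvement based on it) is tied to the chosen initial state distribution, as noted above.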
Similar Resources
Online learning in episodic Markovian decision processes by relative entropy policy search
We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret defined as the difference between the total loss of the best stationary policy and the total loss suffered by the learner. We assume that the learner is given access to a ...
Learning to Serve and Bounce a Ball
In this paper we investigate learning the tasks of ball serving and ball bouncing. These tasks display characteristics which are common in a variety of motor skills. To learn the required motor skills for these tasks, the robot uses Relative Entropy Policy Search (REPS), a state-of-the-art method in policy search reinforcement learning. Our experiments show that REPS does not only converge cons...
Twenty Questions for Localizing Multiple Objects by Counting: Bayes Optimal Policies for Entropy Loss
We consider the problem of twenty questions with noiseless answers, in which we aim to locate multiple objects by querying the number of objects in each of a sequence of chosen sets. We assume a joint Bayesian prior density on the locations of the objects and seek to choose the sets queried to minimize the expected entropy of the Bayesian posterior distribution after a fixed number of questions...
Some properties of the parametric relative operator entropy
The notion of entropy was introduced by Clausius in 1850, and some of the main steps towards the consolidation of the concept were taken by Boltzmann and Gibbs. Since then several extensions and reformulations have been developed in various disciplines with motivations and applications in different subjects, such as statistical mechanics, information theory, and dynamical systems. Fujii and Kam...
Policy Search with High-Dimensional Context Variables
Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, suc...
Observational Modeling of the Kolmogorov-Sinai Entropy
In this paper, Kolmogorov-Sinai entropy is studied using mathematical modeling of an observer $\Theta$. The relative entropy of a sub-$\sigma_\Theta$-algebra having finite atoms is defined and then the ergodic properties of relative semi-dynamical systems are investigated. Also, a relative version of the Kolmogorov-Sinai theorem is given. Finally, it is proved that the relative entropy of a...
Publication date: 2010